Pandas 使用测试模块制作伪数据

在pandas中,有一个测试模块可以帮助我们生成半真实(伪数据),并进行测试,它就是util.testing

创建随机数据集

df = pd.util.testing.makeDataFrame()

创建随机日期索引数据集

df = pd.util.testing.makePeriodFrame()
df = pd.util.testing.makeTimeDataFrame()

创建随机混合类型数据集

df = pd.util.testing.makeMixedDataFrame()
>>> import pandas.util.testing as tm
>>> tm.N, tm.K = 15, 3  # 默认的行和列

>>> import numpy as np
>>> np.random.seed(444)

>>> tm.makeTimeDataFrame(freq='M').head()
                 A       B       C
2000-01-31  0.3574 -0.8804  0.2669
2000-02-29  0.3775  0.1526 -0.4803
2000-03-31  1.3823  0.2503  0.3008
2000-04-30  1.1755  0.0785 -0.1791
2000-05-31 -0.9393 -0.9039  1.1837

>>> tm.makeDataFrame().head()
                 A       B       C
nTLGGTiRHF -0.6228  0.6459  0.1251
WPBRn9jtsR -0.3187 -0.8091  1.1501
7B3wWfvuDA -1.9872 -1.0795  0.2987
yJ0BTjehH1  0.8802  0.7403 -1.2154
0luaYUYvy1 -0.9320  1.2912 -0.2907

创建随机个数的字符串

pd.util.testing.rands(n)
[pd.util.testing.rands(3) for _ in range(4)]
# ['crx', 'Z9k', 'KAU', 'mHz']

pd.util.testing.rand(n)

随机创建含有三个元素的数组

[pd.util.testing.rand(3) for _ in range(4)]

[array([0.72753901, 0.01316105, 0.12749649]),
 array([0.68111966, 0.30113528, 0.27567208]),
 array([0.55361183, 0.19179808, 0.89546358]),
 array([0.97648769, 0.94504658, 0.1353476 ])]

pd.util.testing.rands_array(nchars, size, dtype='O')

创建字符串矩阵

[pd.util.testing.rands_array(2,size=(2,2)) for _ in range(4)]


[array([['YX', 'u9'],
        ['MY', 'Fa']], dtype=object), 
 array([['Fa', 'CQ'],
        ['Kf', 'IR']], dtype=object), 
 array([['Lz', 'q8'],
        ['1S', 'WU']], dtype=object), 
 array([['EC', 'Jz'],
        ['2z', '6g']], dtype=object)]

pd.util.testing.randbool(iterable)

根据 iterable 的shape 创建 boolen 类型的 iterable

[pd.util.testing.randbool([1,2]) for _ in range(4)]

[array([[ True,  True]]),
 array([[False, False]]),
 array([[ True, False]]),
 array([[ True,  True]])]
Update time: 2020-05-25

results matching ""

    No results matching ""